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Abstract 

We present a common framework for Bayesian emulation methodologies for multivariate- 
output simulators, or computer models, that employ either parametric linear models or 
nonparametric Gaussian processes. Novel diagnostics suitable for multivariate covariance- 
separable emulators are developed and techniques to improve the adequacy of an emulator 
are discussed and implemented. A variety of emulators are compared for a humanitarian 
relief simulator, modelling aid missions to Sicily after a volcanic eruption and earthquake, 
and a sensitivity analysis is conducted to determine the sensitivity of the simulator out¬ 
put to changes in the input variables. The results from parametric and nonparametric 
emulators are compared in terms of prediction accuracy, uncertainty quantification and 
scientific interpretability. 
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1 Introduction 


There are many systems in the physical, social and engineering sciences for which physical 
experimentation is infeasible or unaffordable. Some examples include investigations on 


engineer to develop a simulator, or computer model, that provides an approximation of the 
observed response from the physical system. In essence, the simulator is a deterministic 
or stochastic mathematical function that maps the inputs of a system to a prediction of 
its outputs. 


ecosystems, infectious diseases, climate change, and galaxy formation (see Kennedy et al. 


2006, for a number of case studies). In such situations, it is now common for the scientist or 


A simulator that has been successfully calibrated and validated, perhaps using physical 
data, can be employed for a number of tasks including prediction, optimisation, and sensi¬ 
tivity and uncertainty analyses (Kennedy and O’Hagan, 2001|). However, both calibrating 
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and exploiting the simnlator typically reqnires very many simnlator evaluations. For com¬ 
plex problems, the computational expense of the simulator means brute-force approaches 
to these problems are infeasible, taking many hours, days or even weeks. Therefore, a 
fundamental step in understanding and using simulators is often the construction of a 


statistical emulator, or meta-model, through a computer experiment (Sacks et ah, 1989). 


Here, the simulator is run at a carefully selected collection of combinations of the input 
variables and the resulting evaluations are treated as data to which a statistical model, 
the emulator, is fitted. The emulator can then be used to produce fast predictions of the 
output of the simulator for any values of the input variables, along with an associated 
measure of the prediction uncertainty. The emulator can then replace and supplement 
the simulator in both statistical calibration and scientific investigation. For more on com¬ 


( 2010 ). 


puter experiments, see Santner et ah (2003), Fang et al. (2006), and Levy and Steinberg 


A Bayesian approach is very natural when constructing statistical emulators (O’Hagan 


2006) with the chosen statistical model treated as a prior distribution on the simulator 


outputs and prediction, with associated uncertainty quant ideation, via the posterior pre¬ 
dictive distribution (see Section 1.2). Typically, a nonparametric Gaussian process (GP) 


regression model (Rasmussen and Williams, 2006) is employed; its advantages include 


flexibly adapting to the simulator evaluations and, for deterministic simulators, inter¬ 
polating between data points. However, for some simulators, these advantages may be 
more than offset by the computational expense of estimating the GP model, and simpler 
and more computationally efficient models, such as multivariate linear regression, may 
be effective and more interpretable. Whatever statistical approach is taken to construct¬ 
ing the emulator, an important step is assessing its adequacy through formal statistical 


diagnostics (Bastos and O’Hagan, 2009). 


Frequently, each run of a simulator outputs a multivariate response, perhaps as a result 
of a time series or other dynamic process. The purpose of this paper is to present a 
Bayesian framework for covariance-separable emulation of multivariate simulators using 
parametric and nonparametric models and to develop novel model diagnostic procedures 
appropriate for such emulators. As part of our presentation, we unify the multivariate 


Gaussian process emulator of Gonti and O’Hagan (2010) and the lightweight emulator of 


Rougier (2007). Through an application to a simulator of a humanitarian relief mission. 


we demonstrate effective emulation, model selection and model checking for multivariate 
problems with a mixture of continuous and categorical input variables. 


1.1 A humanitarian relief simulator with multivariate dynamic output 


Simulators have a long history of use in military and civilian emergency planning (see. 


for example. 

Ingber et al. 

1991 

). DIAMOND (Diplomatic And Military Operations in a 

Non-warhghting Domain; 

Taylor and Lane 

2004) is an emergency planning simulator for 


modelling peace support operations such as humanitarian relief and peace keeping. DIA¬ 
MOND is mission-based, with high-level operational plans deconstructed into missions for 
individual units. It is able to model the actions and interactions between a wide range of 
agents, including military forces in non-warfighting roles, non-governmental organisations 
(NGOs), indigenous forces and civilians. A range of environmental and infrastructure 
features can also be varied. 

Our application of DIAMOND provides a deterministic model of a humanitarian relief 
mission to Sicily after an earthquake and subsequent eruption of Mount Etna. Etna is an 
active stratovolcano on the east coast of Sicily near the cities of Gatania and Giarre (see 
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Figure 1: Map of Sicily showing the locations of Mount Etna, Giarre, Catania, a possible 
humanitarian task-force base, and the capital city Palermo 


Figure]^. It has been designated a “Decade Volcano” by the International Association 
of Volcanology and Chemistry of the Earth’s Interior and the United Nations due to its 
history of large eruptions and proximity to populated areas. Historically, more fatalities 
have been caused by earthquakes in the region, such as in 1693 when an earthquake of 
estimated magnitude 7.4 on the moment magnitude scale devastated the area and caused 
about 12,000 deaths in Catania (~ 63% of the population at the time; Guidoboni et al. 


2007). 


The simulator models damage to the food supply, hospitals and housing (shelter) in 
Giarre and Catania resulting from the earthquake and eruption. An NGO launches a 
humanitarian relief operation which has two missions: 


1. Food Aid Mission 

To supply food to Catania and Giarre by helicopter from the NGO base. 

2. Repair Mission 

To transport engineers from the NGO base to Giarre and Catania, where they repair 
the food supply infrastructure and/or the shelter. 

We consider a scenario designed by the UK Defence Science and Technology Laboratory 
for the explicit and sole aim of model-testing; the scenario is not intended to support any 
real world decisions. Here, the NGO has four helicopter teams, two engineering teams 
and a single food depot. Two helicopter teams are assigned to the food aid mission and 
the others to transporting the engineers for the repair mission. 

The simulator has p = 13 input variables, which represent the scale of the disaster and 
features of the humanitarian relief operation (see Table [^. Eleven of these variables 
are continuous, with the other two being categorical with each having two levels. Input 
variables xi-x^ determine the impact of the earthquake and eruption on the population 
of Giarre and Catania by specifying the capacity of hospitals, shelter and food supply 
immediately following the disaster. The specification of these input variables creates a 
shortfall between population and shelter and/or food supply, leading to casualties. 

The remaining input variables (hve continuous, two categorical) control certain features 
of the humanitarian relief mission. The continuous input variables are self-explanatory 
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Table 1: Input variables for the humanitarian relief mission simulator. The units of measurement 
for helicopter cargo capacity are specihc to this simulator. Note that the initial populations 
in the simulator of Giarre and Catania are 27,000 and 300,000, respectively. Under normal 
circumstances, the simulator only expects 1% of the population per day to require hospital 
treatment. 


Continuous input variables 

Name 

Symbol 

Range 

Units 

Giarre hospital capacity 

Xl 

(135, 270) 

person/day 

Giarre shelter capacity 

X 2 

(13500, 27000) 

person/day 

Giarre food supply capacity 

X3 

(13500, 27000) 

person/day 

Catania hospital capacity 

X4 

(2000, 3000) 

person/day 

Catania shelter capacity 

X5 

(200000, 300000) 

person/day 

Catania food supply capacity 

xe 

(200000, 300000) 

person/day 

Weighting of the engineer toolbox 

Xl 

(0, 1) 

N/A 

Planning time for the humanitarian mission 

Xs 

(36, 60) 

hrs 

Helicopter cruise speed 

Xq 

(220, 270) 

km/hr 

Helicopter cargo capacity 

XlO 

(7000, 7500) 

N/A 

Engineer ground speed 

xn 

(0, 10) 

km/hr 

Categorical input variables 

Name 

Symbol 

Levels 


Recipient of food aid 

X12 

{Giarre & Catania, Catania only} 

Location of NGO base 

Xl 3 

{Continental Europe, Task-force Base} 


with the exception of x^\ the weighting of the engineer toolbox. This variable controls the 
relative importance given to repairing shelter and the food supply by the two engineering 
teams; xy = 0 (xy = 1 ) corresponds to engineers only repairing the shelter (food supply). 

The two levels for categorical variable Xi 2 correspond to, respectively, supplying food aid 
to both Giarre and Catania or to Catania alone. Although the second option is perhaps 
morally and politically unappealing, it may be practically relevant as there can be a 
much greater shortfall between the available and required food in Catania. Simulation 
modelling allows investigation of the impact of potentially unattractive options. For 0 : 13 , 
the two levels correspond to the NGO base being (i) in continental Europe or (ii) part of 
a military task force located on a fleet of ships in in the Strait of Messina between Italy 
and Sicily (see Figure [^. 

Each run of the simulator is dehned by a setting for Xi-Xi^. The output from each 
simulator run is the number of civilian casualties that have occurred on each of days 
2,3,4,5 and 6 following the disaster. Therefore, the output for each run is a five dimensional 
vector. 


1.2 Bayesian emulators 

A Bayesian approach will be taken to constructing an emulator for the DIAMOND sim¬ 
ulator. Let X = [xi, ...,Xp)^ G A C denote the vector of p input variables, with X 
the p-dimensional input space. The simulator is assumed to be a black-box function, 
/ : A —)■ 3 ^ C with y the fc-dimensional output space; that is 

/(x) = {/i(x),...,/fc(x)}^ , 

is the /c X 1 output vector from the simulator at input combination x. An emulator for 
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/(■) is a prediction equation that provides a surrogate for /(xq), where xq is an input 
combination at which the simulator has not previously been evaluated. 

For a collection of input combinations ( = {xi,...,x„}, with Xj = {xa, ...,Xip)"^, the 
simulator outputs are collated into an n x k output matrix 



A priori, we assume that F is a realisation from a probability distribution, specihed up 
to a d X 1 vector of unknown parameters 0 G 0, with 0 C the parameter space. After 
running the simulator for the input combinations in (, the emulator is constructed as the 
posterior predictive distribution (see, for example, O’Hagan and Forster 2004, p. 89) of 
yo = /(xo), given by 


7r(yo|y)= / 7r(yo|0,y)7r(0|y)d0. (1) 

Je 

Here, 7t{6\Y) is the posterior density function for 6, found using Bayes theorem, and 
7r(yo|0,H) is the conditional posterior predictive density for yo. 

In the remainder of this article, methodology for multivariate Bayesian emulation is de¬ 
veloped and applied. In Section the detailed methodology used to obtain the posterior 
predictive distribution is described for both multivariate Gaussian processes and linear 
models. In Section model selection and diagnostics for multivariate emulators are de¬ 
veloped and discussed. In Section results are presented from applying the methodology 
to emulating the DIAMOND simulator. Section gives a brief discussion. 

Code to £t the emulators described in this paper and the training and test datasets are 
provided as supplementary material. 


2 Multivariate emulation via the posterior predictive distribu¬ 
tion 


In this section, the posterior predictive distribution is derived for a general class of mul¬ 
tivariate linear models that includes Gaussian process (GP) models and linear regres¬ 
sion models. As such, the multivariate GP emulator of Conti and O’Hagan ( 2010| and 
lightweight emulator of Rougier (2007) are special cases. We also demonstrate how the 


multivariate GP emulator can include categorical input variables using the distance met¬ 


rics of Qian et ah (2008). 


Our basic modelling assumption is that any hnite set of multivariate responses has a 


joint matrix normal distribution (Dawid, 1981) with mean function a linear combination 


of unknown model parameters and a separable covariance structure with, potentially, 
correlations between outputs from the same run and also between different runs of the 
simulator. That is, for n x k response matrix Y 


F|R,S,A~MN„,fc(i7R,E,A) , (2) 

where HB is the n x k mean matrix and E and A are, respectively, k x k and n x n 
positive dehnite column and row scale matrices. Note that 


5 





















vec{Y)\B, S, v4 ~ N„fc {vec{HB), S (g) A) 


is a multivariate normal distribution, where vec(-) denotes the vectorisation function that 
stacks columns of a matrix and 0 denotes the Kronecker product. 

In (§, the matrix H is the n x m model matrix with ith row given by h(xj)^, where 
h : df —)■ "H C M™' is a known function of the simulator inputs (i = 1,..., n). For example, 
if h(x) = (l,xi), then the model contains an intercept and a linear term in Xi. If some 
input variables are categorical, then we dehne the appropriate elements of h(xj) through 
the usual constraints, for example corner-point or sum-to-zero. The matrix i? is an m x A; 
matrix of unknown regression parameters. 


The separability of the covariance structure implied by this matrix normal distribution 
results in a common scale matrix S for the k multivariate responses at each of the n 
simulator runs. An emulator with a separable covariance structure is both easier to 
implement and interpret. If diagnostic measures (see Section 3.1) suggest inadequacy of 
the separable emulator, alternative methodologies could be employed (see, for example, 
Fricker et al., 2013, and references therein). 


If homogeneity of variance across the simulator runs is assumed, that is Var {/(xj)} = S 
for alH = 1,..., n, then A can be specihed as a correlation matrix. For the multivariate 
GP emulator, we dehne A through a stationary correlation function, and set fjth entry 
equal to aij = c(|xj — Xj|; r), i.e. the correlation between any two rows of Y depends only 
on the distance between x* and Xj {i,j = 1,... ,n) and a vector of unknown correlation 
parameters r. The lightweight emulator is dehned as a special case with 


c(xi,Xj;r) 


1 if i = j , 

0 if otherwise. 


Thus we can replace conditioning on A in (|^ by conditioning on r. 

We use the conditionally conjugate (given r) matrix-normal-inverse-Wishart (MNIW) 
prior distribution for B and S, denoted MNIW^^a: (Tf, G, S, 5), where 


S|S,r ~ MN^,fe(M,S,G), 
S|r ~ mk{S,6). 


( 3 ) 

( 4 ) 


Here, IW^ denotes the inverse-Wishart distribution for k x k positive-dehnite matrices, 
M, G and S are the m x k, m x m and k x k matrices of hyperparameters, respectively, 
and 5 > 0 is the prior degrees of freedom. The corresponding probability density function 
is given in Section 1 of the Supplementary Material, up to a normalising constant; see 
also Rougier (2007). 

Using this prior distribution the conditional posterior distribution, given r, is 


R, S|y, r ~ MNIW^,fc (m, G, , 

see Section 2 of the Supplementary Material, where 
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M = n {H^A-^Y + n-^M) , 

S = Y^A-^Y + M^n-^M + S - M^Cl-^M, 
S = 6 + n. 


To predict the simulator output Yq = [/(xqi), ...,/(xo„o)]'^ at a set of Uq test inputs, 
Co = {xoi, ...,xono}, we first define the joint conditional distribution of Y and Yq 


Y 

Yq 


5, S, r ~ MN- 


n-\-no,k 


H 

Ho 




A T 
An 


( 5 ) 


where Hq is the hq x m matrix with nth row h(xou)^, Aq is the uq x no matrix with unth 
element given by c(xom, Xot,; r), and T is the n x uq matrix with iuth element given by 
c(xi, xo„; r) (u, n = 1,..., no; i = 1,..., n). 

It can be shown (see Section 3 of the Supplementary Material) that the conditional dis¬ 
tribution of To is 


Yq\Y, 5, E, r ~ {HqB + T^A-\Y - HB), E, Ao - T^A-'T) . (6) 

From (|^ and ([^, we can see the fundamental difference between the GP and lightweight 
emulators; for the lightweight emulator, the output from different simulator runs is as¬ 
sumed independent given {i?,E} and hence the matrix, T, of correlations between the 
observed and unobserved simulator runs will be a zero matrix. Hence, conditional on B 
and E, the distribution of Yq does not depend on Y . For the GP emulator, with non-zero 
correlations between simulator runs, the dependence between Yq and Y remains even after 
conditioning on B and E. 

To obtain the posterior predictive distribution of Yq, given r, we integrate (|^ with respect 
to the posterior distribution of B and E (see Section 4 of the Supplementary Material): 

yo|T,r~MT„„,fc(Q,^,i?,5) , (7) 


where 


Q = HqM + T^A-^ {y - HM^ , 

R = Aq-T^A-^T + {HQ-T^A-^H)Cl{HQ-T^A-^Hy , 


and MTnQ,k{Q, S,R,S) denotes the matrix t-distribution (Javier and Gupta, 1985) with 
location matrix Q, column scale matrix S, row scale matrix R and degrees of freedom 5. 
Marginal posterior predictive distributions for the nth simulator run, yo^ = /(xq^i), and 
the sth output, uq^us = fsi'^ou) are multivariate and univariate t distributions, respectively: 


yo«|H, r ~ tk 
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yQ^us\y,r ~ t 


( 8 ) 



Here, is the wth row of Q and qus is the nsth element of Q, Ruu is the nth diagonal 
element of R and Sss is the sth diagonal element of S. 

For the lightweight emnlator, where H = /„, an n x n identity matrix, ([^ provides closed- 
form posterior predictive distribntions. For the mnltivariate GP emnlator, and the most 
commonly nsed correlation fnnctions c(-, ■; r), there does not exist a prior distribntion for 
r snch that a closed-form expression can be obtained for the marginal posterior predictive 
distribntion of Yq. Typically, one of two approaches is taken: (i) r is replaced by a “plng- 
in” estimate f, a representative valne with respect to the marginal posterior distribntion 
of r; or (ii) Markov Chain Monte Carlo (MCMC) methods are nsed to sample from the 
marginal posterior distribntion of r and then for each sampled valne of r, a value is drawn 
from the conditional posterior predictive distribution ([^. 

The plug-in approach is less computationally expensive than the fully Bayesian approach 
and provides a closed-form emulator. We adopt the plug-in approach for prediction using 
the marginal posterior mode of r, obtained by maximising the unnormalised marginal 
posterior density 


7r(r|F) oc TTrWIHr^|f]|21^|-^ 


where 7rr(r) is the prior probability density function for 


r. 


The hnal step in building the multivariate GP emulator is choice of the correlation function 
c(-, ■; r). The most commonly used function is the power exponential function, which was 


extended by Qian et al. (2008) to incorporate both quantitative and qualitative variables. 


Assuming without loss of generality that the variables are ordered, so that the hrst pi 
variables in x are quantitative and the next p — pi are qualitative variables, a correlation 
function that is exchangeable in the levels of the qualitative variables has the form 


f Pi 

c(xi,X 2 ;r) = exp - X 2 i\ 


i=i 


- ^ ril{xii 7 ^ X2i) 

i=Pi+i 


(9) 


) suggested a number of correlation functions for qualitative variables, 
the common form ([^ for two-level qualitative variables. Throughout 
this paper, we fix gi = 2 for all 1. 


Qian et al. (2008 


each reducing to 


3 Emulator diagnostics and improvement 

In this section, we address diagnostics for emulator adequacy and methods for improving 
emulator performance, including variable selection and the addition of a nugget term for 
the multivariate Gaussian process. 


3.1 Emulator diagnostics 


We start by developing generalisations to multivariate emulators of the diagnostics pro¬ 
vided by Bastos and O’Hagan (2009) for univariate Gaussian process emulators. These 













diagnostics assess the assumption underlying (|^, that the responses conditionally follow 
a matrix normal distribution with specihed mean and correlation functions. Their evalu¬ 
ation requires an additional validation set of simulator runs, Co and Iq; to be available. 


3.1.1 Individual prediction errors 


As suggested by Bastos and O’Hagan (2009), standardised prediction errors can be ex¬ 
plored graphically or used to construct nominal-level predictive probability intervals. If 
the emulator is an adequate model of the simulator, from ([^, the standardised individual 
prediction error 


DliYt,) 



iyO,us (Ins') 


has a standard t-distribution, conditional on Y with S degrees of freedom (n = 1,..., no; s = 
1,... ,k). A large number of outlying standardised prediction errors, with respect to the 


reference distribution, indicates serious inadequacy of the emulator. Bastos and O’Hagan 


(2009) suggested various graphical methods for identifying patterns in outliers and, sub¬ 
sequently, causes for emulator inadequacy; for example, plots of the individual prediction 
errors against each input variable or the predictive mean. 


Individual (1 — a) x 100% predictive probability intervals for each element of Yq can be 
constructed as 


Qus 



where Cq, is the (1 — a/2)th quantile of the standard t-distribution with S degrees of 
freedom. The obtained coverage of these intervals can be compared against 1 — a, with 
low coverage suggesting the emulator is underestimating the prediction uncertainty. 


3.1.2 Omnibus diagnostic 


We now develop a summary statistic for overall emulator adequacy, analogous to the 
Mahalanobis distance diagnostic of Bastos and O’Hagan (2009). Dehne E as the no x k 
matrix of standardised predictions 


E = G],^ {Yo-Q)G-s\ 


, for an adequate 

emulator, the conditional posterior distribution of E is 

E\Y,r MTno,k (0noxk,h,InoJ'^ ■ 

We now dehne the diagnostic 


where R = GrG]^ 


and 5 = 


Following Javier and Gupta (1985 


U = \Ik + E^E\-^ 


( 10 ) 
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with extreme (large or small) values of U, relative to the reference distribution, indicat¬ 
ing emulator inadequacy. Following [Dickey (1967 , the reference distribution for t/ is a 

307) showed that 


Anderson 


(2003 


P- 


^knok+s-i distribution (conditional on Y and r). 
the fc+ 5 _i distribution has the same distribution as a product of k independent Beta 
random variables, i.e. 


k 

~ '^k,no,k+S-ly 

s=l 

where Xg ~ Beta ^(/c -|- (5 — s)/2, no/2^. Summaries of this distribution can be calculated 
by simulation. 

The matrices Gn and Gs are not unique and depend on the chosen decomposition of R 
and S, respectively; for example, the eigen or Cholesky decomposition. However, 


U = \h + iG-syiYo-QfR-^iYo-Q)G-s^\-^ 

= \h + S-^{Yo-QfR-^iYo-Q)\-\ 

and therefore the value of the diagnostic U is invariant to the choice of decomposition. 
Assuming (|^, also note that 


cov (vec(i7)) = 


S-2 


‘fcno 1 


and hence the elements of 5^E form an uncorrelated sample from the t-distribution with 
S degrees of freedom. Quantile-quantile (QQ) plots of these elements can be used as 
an additional check on emulator adequacy. The elements of E are dependent on the 
decomposition used to obtain Gr and Gs- However as noted by [Bastos and O’Hagan 


(2009), any choice of decomposition method is appropriate for use in a QQ-plot, and we 


use the Cholesky decomposition. 

For univariate simulator output (fc = 1), the omnibus statistic reverts to the Mahalanobis 
distance suggested by Bastos and O’Hagan (2009). Now, is an uq x 1 vector following 
a t„o(0, (l/(5)/no,^) distribution, E'^E is scalar and 1 — H ~ Beta(no/2, h/2) with 


UqU 


6 rp 

= —E^E 

no 


F ho,h 


The quantity E'^E/{S — 2) is the Mahalanobis distance and F{a,b) denotes an F distri¬ 
bution with a and b degrees of freedom. 

3.2 Emulator improvement 


The diagnostics in Section 3.1| can be used to suggest improvements to a multivariate 
emulator. For example, graphical assessment of standardised errors may suggest different 
mean functions h(x), transformations of inputs, or regions of X where new simulator 
runs should be performed; see Bastos and O’Hagan (2009). We focus on selection of 


an appropriate mean function and improvement of GP emulators via the addition of a 
nugget. 
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3.2.1 Mean function selection via model comparison 


It is common in the application of GP emulators to usually assume a simple form for the 
mean function such as h(x) = 1 or h(x) = c(l,x) (see, for example, Bayarri et ah, 2007). 
Clearly, for the lightweight emulator, with uncorrelated errors, such a simple assumption 
will usually be inappropriate. We demonstrate in Section that using an overly complex 
mean function (i.e. overfitting) can also be detrimental to the accuracy of the emulator 
on an independent test data set, as with the more usual applications of the linear model. 
This motivates the use of Bayesian model comparison as a vehicle for the selection of an 
appropriate mean function. 

Let each unique choice of h(x) be indexed by v, i.e. we label mean functions as h.^(x), 
with V E V and V denoting the set of possible models. Then, following equations ([^ 
and ([^, 


Bayarri et al., 2007 


Y\B^, T,y,V, Yy ~ {HyBy, T,y, Ay) , 


Yo\Y,v,ry r^MTno,k(Qv,Sy,Ry,6y'^ , (11) 

where 

Qy = Hy,oMy+TjA-\Y -HyMy), 

Ry = Ay,0-TjA-^Ty + {Hy,0-TjA-^Hy)ny{Hy,0-TjA-^Hy)^ , 

ny = {HjA-^Hy + n-Y\ 

My = ny{HjA-^Y + n-^My) , 

Sy = Y^A-^Y + Mjn-^My + Sy-Mjn-^My, 

Sy - Sy T* U , 


My, Qy, Sy anO Sy are hyperparameters for the uth model, holds the correlation pa¬ 
rameters for the uth model, and Hy Q, Hy, Ay, Ay^Q, Ty and By for model v are analogous 
to matrices defined in Section |2l 

A fully Bayesian approach would average ( pT| with respect to the posterior distribu¬ 
tion of the correlation parameters, r^, and the posterior model probabilities to provide 
a model-averaged posterior predictive distribution. Alternatively, Bayesian model com¬ 
parison can be used to identify a model v, based on the posterior model probabilities, 
and Yo\Y,ry,v can be employed as an emulator. The obvious choice for v is the posterior 
modal model with highest posterior model probability. We adopt this latter approach, 
both for computational convenience and also to provide interpretable emulators that aid 
scientific understanding of the simulator. 

The posterior model probability for model v is given by 


7i{v\Y) = 


7i{v) f 7r(y |ri,, u)7r(ri,|u)dr 


E^ev u)7r(r^|u)dr„ ’ 

where 7r(v) is the prior model probability of v such that Eijsv ^('^) ~ 
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p I fc+i5i,-l 

J- fc I 2 


pnA:/2r^ |^^|fc/2 |(]^|fc/2 |^^|(5„+fc_l)/2 ’ 

and rfc(-) is the multivariate gamma function (Javier and Gupta, 1985) 


k 

rfc(x) = JJ r (x - (s - l)/2) . 

S = 1 


The term J 7 r(y'|r^, n) 7 r(r^,|n)dr„ which features in the posterior model probability is 
known as the marginal likelihood. For the GP emulator, the integration required to 
evaluate the marginal likelihood will not be analytically tractable. For the lightweight 
emulator, where = In and does not depend on r^, the marginal likelihood is available in 
closed form. However, if the number of models, |V|, is large then calculating the marginal 
likelihood for every model will be computationally infeasible. Instead we generate a sam¬ 
ple from the posterior distribution of the model index, v, using MGMG methods. For a 
GP emulator, each iteration of the MGMG method has two phases. 


Phase 1 uses the MGMG model composition algorithm (Raftery et ah, 1997) to update 
the model index conditional on the current value of the correlation parameters. Suppose 
the current model is v and a move to a model w is proposed with probability p(n, w) 
where the correlation parameters remain unchanged, i.e. = r.„. The move is accepted 
with probability 


_ 7r(y|r^,w)7r(w) p{w,v) 
n{Y\ry,v)7r{v) p{v,w) ' 

Phase 2 updates the correlation parameters, r^, conditional on the current model v using 
a suitable MGMG method. We employ a random walk Metropolis-Hastings algorithm. 

For the lightweight emulator, phase 2 is not required. After a large number of iterations, 
when the chain has reached a stationary distribution, the proportion of iterations that 
visit model v provides an approximation to 7 r(n|y). We choose p{v,w) such that (i) 
proposed models can only add or remove a single term from the current model, adhering 
to marginality, and (ii) all possible models that obey these conditions are equally likely 
to be proposed. 


3.2.2 Non-zero nugget 


Gramacy and Lee ( 2012 ) discussed improving the adequacy of univariate GP emulators 


via the inclusion of a non-zero nugget parameter, principally to mitigate the effects of 
incorrect model assumptions. Use of a nugget changes the (z, j)th element of A, 




= c{xi,Xj;r)+pI{i = j), 


where p > 0 is the nugget parameter and I{i = j) is the indicator function. For predic¬ 
tion, we again adopt a plug-in approach for the nugget parameter, and replace z; by a 
representative value 17 (the posterior mode). For model selection, the value of the nugget 
is sampled in phase 2 of the MGMG algorithm. The prior for p used in this paper is 
given by irlp) = (1 J- p‘^)~^, previously used by Gonti and O’Hagan (2010) for correlation 
parameters. 
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4 Application to the DIAMOND simulator 


In this section, the methodology from Sections and is employed to construct and check 
multivariate GP and lightweight emulators for the DIAMOND simulator. Recall that the 
scenario under investigation has been solely designed for model testing purposes. Hence, 
when, for example, we refer to the importance of specihc input variables, we do so only 
in that context. In particular, we do not intend these observations to be applied to other 
situations. For the construction of each emulator, we scale the continuous input variables 
to [0,1] and denote the levels of the categorical variables as {0,1}. 


4.1 Prior information 


When constructing individual GP and lightweight emulators, we assume weak prior in¬ 
formation for the model parameters B, S and r, following Gonti and O’Hagan (2010): 


M 


Omxfc ) 
Omxm ^ 


S Ofcxfc ) 

5 = —k -\-1 . 


The correlation parameters r are assumed independent, with prior distributions 
using the approach of Linkletter et al. (2006). We rewrite c(xi,X 2 ;r), from 


specihed 

as 


pi p 

c(x.,X2;r) = nPp-”'" n 

/=! /=pi+l 

where pi = exp(—n) G (0,1) for r; > 0 (/ = l,...,p). We assume a uniform prior 
distribution for pi, leading to the induced prior for ri being an exponential distribution 
with E{ri) = 1. 

When performing model comparison for the selection of the mean function with only 
weak prior information available for the parameters of each model, we adopt prior hyper¬ 
parameters = Ofcxfc and 6^ = —k + 1 for which is present in all models, and unit 
information prior distributions for By, with My = Opxp and 


ny = n{HjA-^Hy) 


-1 


as proposed by Kass and Wassermanj (|1995[). The use of proper prior distributions for 


By avoids Findley’s paradox (see Bernardo and Smith, 1994, pg 394) which states that 


the posterior model probabilities are sensitive to the scale of the prior variance (see also 
O’Hagan and Forster, 2004, pp. 322-324, Raftery et ah, 1997 and Fernandez et al., [2001 ). 
We assume the same exponential prior, see above, for each element of for each model, 
i.e. 7r(r^|n) = 7r(r^). A uniform prior over the model space is chosen, i.e. 7r(n) = |V|“^, 
where V is the set of all sub-models of the maximal model that respect marginality. 
The maximal model has a mean function consisting of the intercept, all linear, two-way 
interaction and, for the continuous inputs, quadratic terms. The resulting model matrix, 
H, has m = 103 columns. 
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For this weak prior information, a from (12) reduces to 


a = 


I P{W,V) 

■ ^ |5JV2p(n,H’ 


where 





n 

n + 1 




-1 



Y. 


4.2 Design of the computer experiment 


We employed a space-hlling design that would enable the estimation of both the Gaussian 
process and lightweight emulators. The most common design used for computer experi¬ 
ments is the Latin Hypercube (McKay et al., 1979) and its extensions (see, for example. 


Tang, 1993, and Morris and Mitchell, 1995). Such designs provide low-dimensional uni¬ 


formity in the input variables, hence achieving good projection properties, and allow the 
estimation of nonparametric regression models. They are also an attractive choice for 
lightweight emulation, as the exact form of the emulator will be unknown in advance of 
the data collection and a flexible design that allows the htting of many different parametric 
models may be required (see Section 3.2). 


The design, C, = {xi,...,x„}, for this study needed to combine both continuous and 


categorical input variables. We used a sliced space-hlling design as proposed by |Qian and 
Wu (2009) with n = 120 runs. Such a design, constructed from an orthogonal array, has 


not only good space-hlling properties overall but also for the projection into the continuous 
variables for each combination of values of the categorical input variables. 


4.3 Construction of adequate emulators 

We constructed both lightweight and multivariate GP emulators for the DIAMOND sim¬ 
ulator using the n = 120 simulator runs, each outputting k = 5 responses, from the sliced 
space-hlling design as training data. For model validation and diagnostics, we use a sec¬ 
ond design (q = {xqi, ...,xon(,}, with associated hq x k simulator output matrix Yq. This 
design is also a sliced space-hlling design with hq = 120 runs and was constructed using 
a diherent orthogonal array to that used to construct (. 

We chose, assessed and compared emulators using the diagnostics from Section We 
calculated the root mean squared error (RMSE) for Yq, 


RMSE 


1 

nok 


riQ k 

(^o,«s— 

U=1 S=1 



where To,ns is the simulator output from the uth validation run for response s. We also 
calculated the root relative mean squared error (RRMSE), 


RRMSE 


-l tiq k /T/" 

EE ^ 0,us 'lus ) 


n 1/2 


nok 


U=1 S=1 


y2 
^ 0,ns 
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Table 2: Observed values (to 3 decimal places) of the omnibus diagnostic U, coverage of the 
95% predictive probability intervals, RMSE and RRMSE for the various emulators considered. 
The reference distribntion for U has expected value of 0.030, and 2.5% and 97.5% quantiles of 
0.019 and 0.044, respectively 


Emulator 

Mean function 

Nugget 

U 

Goverage 

RMSE 

RRMSE 

Lightweight 

Maximal 

NA 

0.000 

0.478 

2728.791 

6.975 


Modal 

NA 

0.025 

0.953 

988.729 

0.528 

Multivariate GP 

Intercept 

Zero 

0.001 

0.958 

415.030 

0.457 


Linear 

Zero 

0.015 

0.965 

344.234 

0.397 


Modal 

Zero 

0.012 

0.958 

341.859 

0.396 


Maximal 

Zero 

0.000 

0.477 

2701.149 

6.791 

Multivariate GP 

Intercept 

Non-zero 

0.033 

0.975 

363.014 

0.387 


Linear 

Non-zero 

0.019 

0.948 

1264.094 

0.539 


Modal 

Non-zero 

0.034 

0.963 

334.597 

0.403 


Maximal 

Non-zero 

0.000 

0.478 

2728.383 

6.973 


where the point estimate 7 ^^ = E (Yq r) /E (Eq r) minimises the relative squared 
error loss function. 


4.3.1 Lightweight emulators 


Our hrst lightweight emulator was the maximal model. The value of the omnibus test 
statistic, U, and coverage of the 95% predictive probability intervals are given in Table 
Note that the reference distribution for U has expected value of 0.030, and 2.5% and 97.5% 
quantiles of 0.019 and 0.044, respectively. The diagnostics indicate there is a discrepancy 
between the simulator and this emulator, with the observed value of U and the achieved 
coverage both being low. Further evidence of this discrepancy is the QQ-plot of the 
uncorrelated errors against a reference t-distribution. Figure [^a); the points form a line 
with slope greater than one, indicating that the variance associated with the emulator 
predictions has been underestimated. 


To attempt to alleviate the obvious inadequacy of this emulator, alternative mean func¬ 
tions h(x) were compared using Bayesian model comparison (Section 3.2). The posterior 
modal model was found from 10^ iterations of the MCMC algorithm (discarding the first 
10% of iterations as burn in). The algorithm took 2.5 minutes on a computer with a 
3.20Ghz processor and 8 Gb RAM, and the average acceptance rate for the proposed 
moves in Phase 1 was 4.7%, reflecting the concentration of the posterior model proba¬ 
bilities on a small number of models. Table displays the terms in the posterior modal 
model, and gives the associated posterior marginal inclusion probabilities (i.e. the propor¬ 
tion of visited models that included that term). The model matrix, H, for the posterior 
modal model has m = 11 columns. The value of U and the coverage for the emulator 
with this alternative mean function are shown in Table These values suggest there is 
no evidence of a discrepancy between the simulator and the emulator. This conclusion is 
supported by the QQ-plot of the uncorrelated errors in Figurej^b). Also shown in Table 
are the RMSE and the RRMSE of the maximal and modal model emulators. Note how 
the simpler form of emulator has smaller values for RMSE and RRMSE, indicating the 
modal model has significantly improved predictive accuracy. 
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(a) Maximal model 


(b) Modal model 




Theoretical quantiles Theoretical quantiles 

Figure 2: QQ-plots of the uncorrelated errors against a reference t-distribution for lightweight 
emulators: (a) the maximal model; (b) the modal model 


Table 3: Marginal posterior probabilities (up to 3 decimal places) of the terms in the modal 
mean functions 


Terms 



Lightweight 

GP 

(zero 

nugget) 

GP 

(non-zero 

nugget) 

Linear Effects 

Food capacity (Giarre) 


X3 

0.999 

1.000 

1.000 

Food capacity (Catania) 


Xq 

1.000 

1.000 

1.000 

Planning time 


Xs 

0.970 

1.000 

1.000 

Recipient of food aid 


Xl2 

1.000 

- 

- 

Location of NGO base 


Xl3 

1.000 

1.000 

1.000 

Quadratic Effects 

Planning time 



0.764 

0.999 

1.000 

Interactions 

Food capacity (Giarre) x 

Recipient of food aid 


0.811 

- 

- 

Food capacity (Catania) 

X Recipient of food aid 


1.000 

- 

- 

Food capacity (Catania) 

X Location of NGO base 


1.000 

0.983 

0.828 

Recipient of food aid x Location of NGO base 


0.914 

- 

- 


4.3.2 Multivariate Gaussian process emulators 


We construct GP emulators with four different forms for the mean function, h(x): (i) 
intercept only (m = 1); (ii) linear terms only (m = 8); (iii) the modal model found 
by the model comparison procedure (m = 7; see Table [^; and (iv) the maximal model 
[m = 103). We initially £x the nugget at zero. As a comparison with Section 4.3.1, the 
model comparison procedure took 30 minutes and had an acceptance rate of 0.5%. 


Table shows the values of U and the coverage for these four GP emulators. Figure |^a- 
d) show QQ-plots of the uncorrelated errors for these emulators. Glearly, the values in 
Table and the QQ-plots show that there exist serious discrepancies between all four 
emulators and the simulator. Similar to the maximal lightweight emulator, the QQ-plot 
shows that the variances associated with the GP emulator predictions are underestimated. 


To remedy these inadequacies, we included a non-zero nugget in emulators using all four 
forms of the mean function. The model comparison algorithm took 30 minutes and had 
an acceptance rate of 1.6%. The modal mean function for both types of GP emulator 
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(a) Intercept model (zero nugget) 


(b) Linear model (zero nugget) 


(c) Modal model (zero nugget) 


(d) Maximal model (zero nugget) 


• 

m 

Sample quantiles 

f f f ? f 

• • 

•0 

• 

Sample quantiles 

f f f f ? f 

• • 

• 

• 

Is- 

Q. — 

i o 

o 

• 

• 

-3-2-10 1 2 3 


-3-2-10 1 2 3 


-3-2-10 1 2 3 

' 

^— I — I—— 1 — 1 — 

-3-2-10 1 2 3 

Theoretical quantiles 
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Theoretical quantiles 

)e) Intercept model (non-zero nugget 


(f) Linear model (non-zero nugget) 


(g) Modal model (non-zero nugget) 


[h) Maximal model (non-zero nugget) 



Figure 3: QQ-plots of the uncorrelated errors against a reference t-distribution for the zero 
nugget GP emulator with (a) the intercept model, (b) the linear model, (c) the modal model, 
(d) the maximal model, and for the non-zero nugget GP emulator with (e) the intercept model, 
(f) the linear model, (g) the modal model, (h) the maximal model 

(with and without nugget) are identical (see Table |^. The values of U and the coverage 
for the four non-zero nugget GP emulators are also shown in Table The corresponding 
QQ-plots are shown in Figure [^e-h). There still exist discrepancies between the emulator 
and simulator for the maximal and linear forms of the mean function. However, for the 
intercept and modal forms, the values in Table and the QQ-plots provide no evidence of 
inadequacy, with the diagnostics being highly plausible under their reference distributions. 
The values of RMSE and RRMSE for all eight GP emulators are also given in Table 
Note the high values of these errors under the maximal models. The intercept and modal 
GP emulators (with non-zero nugget) have signihcantly higher predictive accuracy than 
the lightweight emulators. There appears to be little difference between the intercept and 
modal model for the GP emulators (with non-zero nugget) in terms of predictive accuracy. 


4.4 Sensitivity analysis 


An important application of statistical emulators are sensitivity analyses to identify im¬ 
portant input variables and their impact on the responses. For the lightweight emulator. 


the model comparison algorithm in Section 3.2.1 has the advantage of automatically iden¬ 


tifying the most important input variables. When product terms are included in the mean 
function, it can also identify important interactions. For the DIAMOND simulator, there 
are interactions between the food capacity at Gatania and both the location of the NGO 
base and the recipient of the food aid. There are also interactions between the food ca¬ 
pacity at Giarre and the recipient of food aid and location of NGO base and recipient of 
food aid. There is evidence that planning time has a non-linear effect. 

For the multivariate GP emulator, input variables can impact the response through both 
the mean function and correlation structure. Hence, the model selection algorithm in 
Section |3.2.1| may not identify all the important variables. For an intercept-only GP, the 


relative importance of the input variables is only determined by the relative magnitude 
of the corresponding correlation parameters, r. In general, the output is more sensitive 
to those input variables with large correlation parameters. As calibrating the size of 


correlation parameters can be difficult, Linkletter et al. (2006) proposed a more formal 


variable selection method for univariate GPs, reference distribution variable selection 
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(a) Continuous input variabies 




Food capacity (Giarre) 
Food capacity (Catania) 
Planning time 
Helicopter speed 
Others 
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(b) Categoricai input variabies 



Figure 4; Histograms of the null reference distributions from the RDVS method for the corre¬ 
lation parameters of the (a) continuous and (b) categorical inert input variables. The posterior 
medians of the input variables are shown as vertical lines 


(RDVS). Values of an inert input variable, x*, are randomly generated from the input 
space V. An MCMC sample is generated from the marginal posterior distribution of 
r and r*, where r* is the correlation parameter of the inert input variable. The above 
procedure is repeated B times with different randomly generated values of inert input 
variables. The posterior median of each element of r, approximated from the union of the 
MCMC samples from all randomly generated sets of inert input variables, is compared to 
the null reference distribution of the posterior medians of r* (obtained from the B sets of 
values for the inert input variable). For more details see Linkletter et ah (2006). 


Application of RDVS to multivariate GP emulators is straightforward. Our simulator 
has both continuous and categorical input variables, and hence we adapt RDVS by at 
each iteration randomly generating values for two inert input variables, xl and where 
xl G [0,1] and X 2 G {0,1}, with {0,1} indicating the two levels for a categorical variable. 
The posterior median of the elements of r corresponding to continuous input variables 
is then compared to the null reference distribution of the posterior medians of rl, and 
similarly for the categorical input variables and r^. 


We applied RDVS with the GP emulator (with non-zero nugget and the intercept mean 
function), using B = 1000. Figure displays the null reference distributions for the 
correlation parameters (on the log scale) of the (a) continuous and (b) categorical inert 
input variables, i.e. the 1000 posterior medians of the correlation parameters, rl and 
rl, from the MCMC samples. Also indicated in Figure are the posterior medians of 
the actual input variables as vertical lines. Clearly the most important continuous input 
variables are the food capacities at both Giarre and Catania, planning time and helicopter 
speed. Both of the categorical input variables are deemed to be important. This agrees 
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with the conclusions from the modal lightweight emulator, except for the inclusion of 
helicopter speed. 


RDVS with a GP emulator having mean function including only an intercept is unable 
to explicitly identify interactions. A probabilistic sensitivity analysis (see, for example, 
Santner et ah, 2003, ch. 7) can be used to understand and visualise the functional form 


of the individual and joint effects of the variables. 


The variation in the simulator output induced by variation in the input variables can be 
decomposed into main effects and interactions. Assume interest is in the total number of 
casualties across days two to six of the disaster, given by gix.) = Letting E 

denote expectation with respect to an assumed joint distribution for the input variables 
X, we can then define the following main effects and first-order interactions: 


gi{xi) = E[g{x)\xi\ - go, 
gij{xi,Xj) = E [g{x.)\xi,Xj] - go- gi{xi) - gj{xj) , 


(13) 

(14) 


where go = E [^'(x)]. Corresponding partial variances are given by 


17 = E [g,{x,f] , 

Vij = E[gij{xi,Xjf] , i,j = l,...,p. 


Following Oakley and O’Hagan (2004), these variances can be estimated by their ex¬ 
pectation, denoted E*, with respect to the posterior predictive distribution of (/(x), a 
non-standard t distribution; see Section 5 of the Supplementary Material. Hence, the 
following estimated sensitivity indices can be defined: 


Si = E*{Vi)/E*{V) , (hrst-order) 

Sij = E*{Vij)/E*iy) , (second-order) 


where V = Var [(^(x)] with respect to the distribution of the input variables. Explicit 
formulae for E*(l/), E*(Vi) and E*{Vij) can be derived in terms of the expectation with 
respect to the distribution of the input variables, and are given in Section 6 of the Sup¬ 
plementary Material. 


We assume that the input variables are independent, that the continuous variables are 
uniformly distributed over their corresponding ranges and the categorical input variables 
have probability 0.5 on each of their two levels. We compute the estimated sensitiv¬ 
ity indices under both the multivariate GP emulator (intercept mean function and non¬ 
zero nugget) and, for comparison, the lightweight emulator (modal mean function). For 
the lightweight emulator, the estimated sensitivity indices are available in closed-form 
(Rougier, 2007|[ ) and can only be non-zero for those main effects and interactions featur¬ 
ing in the, selected, modal model. Under the multivariate GP emulator, the expectations 
with respect to the distribution of the input variables require approximation, achieved 
here using Monte Garlo integration. 


Table 1^ shows the estimated sensitivity indices under both emulators. For the multivariate 
GP, we present first-order estimated sensitivity indices for each of the variables identified 
by the RDVS method. We also present the seven largest second-order sensitivity indices; 
four of the corresponding interactions were selected in the modal lightweight emulator. 
The dominance of input variable xq, controlling the food capacity at Gatania, is clear; 
variation in xq induces nearly 90% of the total output variation for both emulators. How¬ 
ever, this input variable, in common with xi-x^ is essentially a noise variable and clearly 
could not be controlled in a real disaster. Hence, of particular interest are the interactions 
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Catania food supply capacity = 1 



-Planning time Recipient of food aid Others 

- Helicopter speed - Location of NGO base 


Figure 5: Plots 
(intercept mean 
at Catania (xq). 


of expected conditional main effects (15) from the multivariate 
function and non-zero nugget) for four different settings for the 


GP emulator 
food capacity 
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Table 4: Estimated first- and second-order sensitivity indices (multiplied by 1000 and displayed 
up to 3 decimal places) of the input variables under the lightweight and multivariate Gaussian 
process emulators. 


Terms 


Lightweight 

Multivariate GP 

First-order 

Food capacity (Giarre) 

X3 

9.978 

7.854 

Food capacity (Gatania) 

Xg 

887.818 

895.176 

Planning time 

X8 

2.589 

1.881 

Helicopter speed 

Xg 

0.000 

0.312 

Recipient of food aid 

Xl2 

2.566 

2.264 

Location of NGO base 

Xl3 

63.739 

64.067 

Sum of Others 


0.000 

0.023 

Second-order 

Food capacity (Giarre) x Recipient of food aid 


1.184 

0.474 

Food capacity (Giarre) x Location of NGO base 


0.000 

0.365 

Food capacity (Gatania) x Recipient of food aid 


1.620 

2.121 

Food capacity (Gatania) x Location of NGO base 


3.599 

6.750 

Planning time x Location of NGO base 


0.000 

0.572 

Planning time x Food capacity (Gatania) 


0.000 

0.173 

Recipient of food aid x Location of NGO base 


1.099 

0.906 

Sum of Others 


0.000 

0.178 


between xg and the control variables X 7 -X 13 . To graphically investigate these effects for 
the GP emulator, in Figure we display the expected conditional main effects 

E* {E [g{x.)\xi,xe = 1] - go} , (15) 

for i = 8,9, 12,13 (as identified by RDVS) and I = 0,1/3, 2/3,1. For xq 7 ^ 1, there are 
strong negative conditional effects for both categorical variables Xu and X 13 , with lower 
casualties resulting from providing food aid only to Catania and, especially, locating 
the NGO base with the task force. However, for xq = 1, variable X 13 no longer has a 
substantive effect and X 12 now has a positive effect (lower casualties result from providing 
food aid to both cities). Planning time (xs) always has a positive effect, although the 
degree of nonlinearity changes with the value of xg 


5 Discussion 

Statistical emulation of multivariate simulators is an important problem in a number 
of application areas and presents challenging methodological issues. We have presented 
a unified Bayesian approach to the construction of both parametric (lightweight, lin¬ 
ear model) and nonparametric (Gaussian process) emulators, including model selection, 
diagnostics and sensitivity analyses. Our application, emulating a humanitarian relief 
simulator applied to an artificial scenario involving an earthquake and volcanic eruption 
in Sicily, demonstrated the utility and versatility of the methodology. We were able to 
identify the most important input variables, and their interactions, using the lightweight 
emulator. While the GP emulator was more accurate, the lightweight emulator was more 
scientifically intuitive and informative. The technology in this paper provides the ca¬ 
pacity for our collaborators to efficiently explore “what-if” questions and to make faster 
“in-theatre” decisions. 
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Extensions of the methodology to allow the constrnction and model-checking of different 
emnlators are possible. In Section only weakly informative prior distribntions were 
assnmed. If more informative prior information was available, this conld be incorporated 
into both lightweight and GP emnlators, for example via the prior distribntion for the 
regression parameters i?|S. It is likely that the nse of snch information wonld lead to a 
smaller difference in predictive accnracy between the two emnlators, provided there was 
not a conflict between the prior information and the simnlator. 


Diagnostics for mnltivariate emnlators were also employed by Fricker et al. (2013) in a 
nnmber of case stndies nsing models with a general class of non-separable covariance 
strnctnre. These diagnostics were similar in spirit to those of Bastos and O’Hagan (2009) 
bnt, for example, the non-separability prevents analytic marginalisation across any of the 
scale parameters when calcnlating the eqnivalent to the omnibns statistic (10). An alter¬ 
native non-separable model may be constrncted as the fnll posterior distribntion nnder 
model nncertainty, see Section |3.2.1[ The model-averaged posterior predictive distribn¬ 


tion is then a mixtnre of matrix t-distribntions; see also Rongier (2007), who proposed 
a mixtnre of matrix normal inverse Wishart joint prior distribntions for B and S. The 
diagnostics described in Section 3T are straightforward to extend to mixtnre distribntions 
by averaging over the components of the mixtnre using simulation. 
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